In this project, we are using a dataset of songs on the music streaming app Spotify.
The dataset contains songs across multiple genres. We will perform several analyses on it: basic descriptive and bivariate statistics, Principal Component Analysis, decision trees, regression, and clustering.
Here is the link to the original dataset
First, we import the data to R and make sure R is reading the data properly.
# importing relevant libraries to perform cleaning on the data
library(tidyverse)
library(janitor)
setwd("~/Documents/class/stats-final-project/")
# importing the data and cleaning the names into a snake_case format.
raw_data <- read.csv("dataset.csv") %>% clean_names()
The dataset has 114,000 rows and 21 columns/variables.
It has the following scores (numerical variables):
- popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular.
- duration_ms: The track length in milliseconds.
- danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
- loudness: The overall loudness of a track in decibels (dB).
- speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
- acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
- liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
- valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
The dataset also has the following categorical variables:
- explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown).
- mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
- track_genre: The genre to which the track belongs.
It also has the following columns that describe the songs:
- track_id: The Spotify ID for the track.
- artists: The names of the artists who performed the track. If there is more than one artist, they are separated by a ;
- album_name: The name of the album in which the track appears.
- track_name: Name of the track.
Next, we run a few checks on the data (dimensions, column names, and column classes).
#dim(raw_data)
#names(raw_data)
#sapply(raw_data, class)
We’ll perform the following transforms to the data in order to prepare it for our analysis:
Since some of the categorical variables (mode, key, and time_signature) are currently coded as numbers, we will do the following:
1. key: We will convert the numbers 0 to 11 to the letter name of the key (0 = C, 1 = C#, etc.).
2. mode: Instead of 0 for minor and 1 for major, we’ll use the labels “minor” and “major”.
3. time_signature: We’ll convert it from numeric to character.

We are also adding 2 more variables:
1. multiple_artists: If there are multiple artists performing the track, the artists column contains all artists separated by a semicolon (;). This column is true if there are multiple artists and false if there is a single artist.
2. tempo_cat: A categorical variable based on the tempo column. We’ll use the beats per minute to determine which tempo marking the track fits in. This is an ordinal variable with defined levels.
We are also applying several filters to scope our analysis:
1. Filter to songs by popular artists. This is done by finding the artists that have 20 songs or more and keeping only the songs by those artists.
2. Scope the analysis to songs that are less than 10 minutes long.
3. Remove duplicated songs, because some songs are listed in both album and single versions. This is done by removing songs that have the same value in the track_name and artists columns.
4. Sample the data down to 3,000 rows/songs. This is done by random sampling using the sample_n() function.
Finally, we’ll just select the columns that are relevant to us in our analysis, and remove the descriptive columns, track_id, artists, album_name, and track_name.
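The preparation steps described above could be sketched as follows. This is only an illustration on a tiny made-up data frame (`toy`), not the code used for the project: the tempo cut-offs and most tempo labels, the artist threshold of 2 (instead of 20), the sample size of 3 (instead of 3,000), and the order of the filters are all assumptions chosen so the example runs on its own.

```r
library(dplyr)

# Toy stand-in for raw_data; every value here is invented for illustration.
toy <- tibble(
  track_id       = paste0("id", 1:8),
  artists        = c("A", "A", "A", "A;B", "A;B", "C", "A", "A;B"),
  album_name     = c("x", "y", "x", "z", "z", "w", "x", "z"),
  track_name     = c("s1", "s1", "s2", "s3", "s4", "s5", "s6", "s7"),
  duration_ms    = c(200000, 200000, 180000, 240000, 700000, 210000, 190000, 230000),
  key            = c(0, 0, 1, 7, 9, 11, 4, 2),
  mode           = c(1, 1, 0, 1, 0, 1, 0, 1),
  time_signature = c(4, 4, 3, 4, 4, 5, 4, 4),
  tempo          = c(62, 62, 95, 118, 130, 170, 140, 100),
  track_genre    = c("pop", "pop", "pop", "rock", "rock", "jazz", "pop", "rock")
)

# Pitch-class lookup: position 1 corresponds to key 0.
# A key of -1 (no key detected) would need separate handling.
key_names <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")

# Assumed tempo-marking boundaries in BPM; the project's cut-offs are not shown.
tempo_breaks <- c(0, 66, 76, 108, 120, 156, 176, Inf)
tempo_labels <- c("Larghetto", "Adagio", "Andante", "Moderato",
                  "Allegro", "Vivace", "Presto")

min_songs <- 2   # the text uses 20; lowered for the toy data
n_sample  <- 3   # the text uses 3,000

set.seed(1)
dd_sketch <- toy %>%
  mutate(
    key              = key_names[key + 1],
    mode             = if_else(mode == 1, "major", "minor"),
    time_signature   = as.character(time_signature),
    multiple_artists = grepl(";", artists, fixed = TRUE),
    tempo_cat        = cut(tempo, breaks = tempo_breaks, labels = tempo_labels,
                           right = FALSE, ordered_result = TRUE)
  ) %>%
  group_by(artists) %>%
  filter(n() >= min_songs) %>%                       # keep "popular" artists
  ungroup() %>%
  filter(duration_ms < 10 * 60 * 1000) %>%           # under 10 minutes
  distinct(track_name, artists, .keep_all = TRUE) %>% # drop duplicated songs
  sample_n(n_sample) %>%                             # random sample
  select(-track_id, -artists, -album_name, -track_name)

print(dd_sketch)
```

Using `cut()` with `ordered_result = TRUE` is one way to get the ordinal tempo variable described above; a lookup vector indexed by `key + 1` is one simple way to map pitch-class numbers to key names.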
After seeing the basic descriptive statistics of the data, we’ll do a bivariate statistics analysis. The purpose is to find relationships between:
1. Categorical vs categorical variables
2. Categorical vs numerical variables
3. Numerical vs numerical variables
We examine the relationship between tempo marking and mode.
library(ggplot2)
# stacked bar chart
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "stack")
# grouped (side-by-side) bar chart
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "dodge")
# proportional (100% stacked) bar chart
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "fill")
From the bar charts above, we see that while there are more songs in a major mode overall, some tempo markings have a higher proportion of minor songs than others. Larghetto and Vivace have the highest proportions of minor songs, which is interesting because Larghetto is on the slower end and Vivace is on the faster end.
Next, we can also examine the relationship between mode and explicitness.
# stacked bar chart
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "stack")
# grouped (side-by-side) bar chart
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "dodge")
# proportional (100% stacked) bar chart
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "fill")
From the plots above, few songs are explicit, so the relationship is hard to judge. The proportion of minor songs is slightly higher among explicit songs than among non-explicit songs, but the difference is very small.
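To put a number on how small this difference is, one option is a chi-squared test of independence on the explicit-by-mode counts. The sketch below rebuilds the 2×2 table from the counts printed in the profiling output later in this report (1,899 major and 913 minor non-explicit songs; 123 major and 65 minor explicit songs), so it runs without the dataset. The object names are illustrative.

```r
# Counts copied from the "Cross Table" for explicit vs mode in the profiling output.
explicit_mode <- matrix(
  c(1899, 913,
     123,  65),
  nrow = 2, byrow = TRUE,
  dimnames = list(explicit = c("False", "True"),
                  mode     = c("major", "minor"))
)

# Share of minor songs within each explicit group (row proportions).
minor_share <- prop.table(explicit_mode, margin = 1)[, "minor"]
print(round(minor_share, 3))

# Chi-squared test of independence; for a 2x2 table chisq.test()
# applies Yates' continuity correction by default.
res <- chisq.test(explicit_mode)
print(res)
```

The p-value is far above 0.05, in line with the profiling output (X-squared = 0.26645, p-value = 0.6057), so this sample gives no evidence that mode depends on explicitness.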
We can also examine the relationship between track_genre and mode.
# stacked bar chart
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "stack") + scale_x_discrete(guide = guide_axis(angle = 90))
# grouped (side-by-side) bar chart
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "dodge") + scale_x_discrete(guide = guide_axis(angle = 90))
# proportional (100% stacked) bar chart, with a smaller legend and text
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "fill") + scale_x_discrete(guide = guide_axis(angle = 90)) +
theme(legend.key.size = unit(0.5, 'cm'), #change legend key size
legend.key.height = unit(0.5, 'cm'), #change legend key height
legend.key.width = unit(0.5, 'cm'), #change legend key width
legend.title = element_text(size=5), #change legend title font size
legend.text = element_text(size=5)) +
theme(text = element_text(size = 7))
# getting the results of the proportion bar chart in a table
dd %>% group_by(track_genre, mode) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(mode == "minor") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'track_genre'. You can override using the `.groups` argument.
We can see that some genres stand out as having a majority of minor songs. The latin genre is 100% minor, although it contributes only a single song to the sample. Synth-pop, turkish, trance, dancehall, romance, spanish, anime and hip-hop are also among the top 10 genres by proportion of minor songs.
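Because several genres contribute only a handful of songs to the sample, a proportion on its own can mislead. A possible variation of the table above keeps the genre total next to the minor share and sets `.groups` explicitly, which also avoids the `summarise()` message. The data frame `toy` below is made up for illustration and stands in for `dd`.

```r
library(dplyr)

# Made-up stand-in for dd with just the two columns needed.
toy <- tibble(
  track_genre = c("latin", "trance", "trance", "trance", "pop", "pop"),
  mode        = c("minor", "minor", "minor", "major", "major", "minor")
)

minor_by_genre <- toy %>%
  group_by(track_genre, mode) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # result stays grouped by track_genre
  mutate(genre_total = sum(n),                   # number of songs in the genre
         freq = n / genre_total) %>%             # share of each mode within the genre
  ungroup() %>%
  filter(mode == "minor") %>%
  arrange(desc(freq), desc(genre_total))

print(minor_by_genre)
```

With `genre_total` in the table, a genre that is 100% minor on the strength of one song is easy to tell apart from a genre with dozens of songs.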
So there seems to be a relationship between genre and mode. Next, we examine the relationship between track_genre and explicit.
# stacked bar chart
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "stack") + scale_x_discrete(guide = guide_axis(angle = 90))
# grouped (side-by-side) bar chart
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "dodge") + scale_x_discrete(guide = guide_axis(angle = 90))
# proportional (100% stacked) bar chart, with a smaller legend and text
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "fill") + scale_x_discrete(guide = guide_axis(angle = 90)) +
theme(legend.key.size = unit(0.5, 'cm'), #change legend key size
legend.key.height = unit(0.5, 'cm'), #change legend key height
legend.key.width = unit(0.5, 'cm'), #change legend key width
legend.title = element_text(size=5), #change legend title font size
legend.text = element_text(size=5)) +
theme(text = element_text(size = 7))
# getting the results of the proportion bar chart in a table
dd %>% group_by(track_genre, explicit) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(explicit == "True") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'track_genre'. You can override using the `.groups` argument.
Explicitness and genre also seem to be related, as explicit songs are concentrated in a few genres. Latino songs are 100% explicit, although this genre has only three songs in the sample. Comedy, country, dance and some of the metal genres also tend to have explicit lyrics.
Next, we can examine the relationship between key and mode.
# stacked bar chart
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "stack")
# grouped (side-by-side) bar chart
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "dodge")
# proportional (100% stacked) bar chart
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "fill")
# getting the results of the proportion bar chart in a table
dd %>% group_by(key, mode) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(mode == "minor") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'key'. You can override using the `.groups` argument.
The key of B has the highest proportion of minor songs. F#/Gb, E and A#/Bb also have a relatively higher percentage of minor songs.
We can also perform analysis on categorical vs numerical variables by using charts such as multiple boxplots to plot the distribution of one numerical variable given another categorical variable.
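As a small illustration of this kind of chart, the sketch below draws one boxplot of a numerical variable per level of a categorical variable with `ggplot2`. The data frame `toy` and its values are simulated and are not taken from the dataset; the column names only mimic those of `dd`.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for dd: a score between 0 and 1 and a two-level category.
toy <- data.frame(
  mode   = rep(c("major", "minor"), each = 50),
  energy = c(rbeta(50, 2, 2), rbeta(50, 3, 2))
)

# One box per level of the categorical variable.
p <- ggplot(toy, aes(x = mode, y = energy)) +
  geom_boxplot() +
  labs(title = "Energy by mode (simulated data)")

print(p)

# Group medians, which correspond to the centre line of each box.
group_medians <- tapply(toy$energy, toy$mode, median)
print(group_medians)
```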
We have used the code by Dr. Karina Gibert to get an overview of the variables, and below are the interesting plots in ggplot().
First, the code below creates functions to test numerical and qualitative variables.
#Computes the test values of variable Xnum for all levels of factor P
ValorTestXnum <- function(Xnum,P){
#frequency distribution of the factor
nk <- as.vector(table(P));
n <- sum(nk);
#group means
xk <- tapply(Xnum,P,mean);
#test values
txk <- (xk-mean(Xnum))/(sd(Xnum)*sqrt((n-nk)/(n*nk)));
#p-values
pxk <- pt(txk,n-1,lower.tail=F);
for(c in 1:length(levels(as.factor(P)))){if (pxk[c]>0.5){pxk[c]<-1-pxk[c]}}
return (pxk)
}
ValorTestXquali <- function(P,Xquali){
taula <- table(P,Xquali);
n <- sum(taula);
pk <- apply(taula,1,sum)/n;
pj <- apply(taula,2,sum)/n;
pf <- taula/(n*pk);
pjm <- matrix(data=pj,nrow=dim(pf)[1],ncol=dim(pf)[2], byrow=TRUE);
dpf <- pf - pjm;
dvt <- sqrt(((1-pk)/(n*pk))%*%t(pj*(1-pj)));
#divisions where the numerator is 0 would give NA and break, so only non-zero entries are divided
zkj <- dpf
zkj[dpf!=0]<-dpf[dpf!=0]/dvt[dpf!=0];
pzkj <- pnorm(zkj,lower.tail=F);
for(c in 1:length(levels(as.factor(P)))){for (s in 1:length(levels(Xquali))){if (pzkj[c,s]> 0.5){pzkj[c,s]<-1- pzkj[c,s]}}}
return (list(rowpf=pf,vtest=zkj,pval=pzkj))
}
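Written out, the quantities computed by the two functions above are the following. The symbols are introduced here for readability and do not appear in the code: $n$ is the number of observations, $n_k$ the number in class $k$ of `P`, $\bar{x}_k$ the class mean, $\bar{x}$ and $s$ the overall mean and standard deviation of the numerical variable, $p_k = n_k / n$ the share of class $k$, $p_j$ the overall share of category $j$ of the qualitative variable, and $p_{j \mid k}$ the share of category $j$ within class $k$.

```latex
t_k = \frac{\bar{x}_k - \bar{x}}{s\,\sqrt{\dfrac{n - n_k}{n\,n_k}}}
\qquad\qquad
z_{kj} = \frac{p_{j \mid k} - p_j}{\sqrt{\dfrac{1 - p_k}{n\,p_k}\; p_j\,(1 - p_j)}}
```

`ValorTestXnum` turns $t_k$ into an upper-tail p-value from a Student t distribution with $n - 1$ degrees of freedom, and `ValorTestXquali` turns $z_{kj}$ into an upper-tail p-value from the standard normal distribution. In both functions, any p-value above 0.5 is replaced by one minus its value, so the reported figure always refers to the smaller tail.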
Let’s run the profiling script for the mode variable
#Data is referred to as "dades" in the following code
dades<-dd
K<-dim(dades)[2]
#par(ask=TRUE)
#P must contain the class variable
P<-dd$mode
nameP<-"mode"
nc<-length(levels(as.factor(P)))
pvalk <- matrix(data=0,nrow=nc,ncol=K, dimnames=list(levels(P),names(dades)))
nameP<-"mode"
n<-dim(dades)[1]
for(k in 1:K){
if (is.numeric(dades[,k])){
print(paste("Analysis by class of the Variable:", names(dades)[k]))
boxplot(dades[,k]~P, main=paste("Boxplot of", names(dades)[k], "vs", nameP ), horizontal=TRUE)
barplot(tapply(dades[[k]], P, mean),main=paste("Means of", names(dades)[k], "by", nameP ))
abline(h=mean(dades[[k]]))
legend(0,mean(dades[[k]]),"global mean",bty="n")
print("Statistics by groups:")
for(s in levels(as.factor(P))) {print(summary(dades[P==s,k]))}
o<-oneway.test(dades[,k]~P)
print(paste("p-valueANOVA:", o$p.value))
kw<-kruskal.test(dades[,k]~P)
print(paste("p-value Kruskal-Wallis:", kw$p.value))
pvalk[,k]<-ValorTestXnum(dades[,k], P)
print("p-values ValorsTest: ")
print(pvalk[,k])
}else{
if(class(dd[,k])=="Date"){
print(summary(dd[,k]))
print(sd(dd[,k]))
#decide breaks: weeks, months, quarters...
hist(dd[,k],breaks="weeks")
}else{
#qualitative variables
print(paste("Variable", names(dades)[k]))
table<-table(P,dades[,k])
# print("Cross-table")
# print(table)
rowperc<-prop.table(table,1)
colperc<-prop.table(table,2)
# print("Distributions conditional on rows")
# print(rowperc)
#careful: if the variable is TRUE/FALSE it is identified as type logical,
#which has no levels, hence the preventive coercion to factor
dades[,k]<-as.factor(dades[,k])
marg <- table(as.factor(P))/n
print(append("Categories=",levels(as.factor(dades[,k]))))
#from next plots, select one of them according to your practical case
#with legend
plot(marg,type="l",ylim=c(0,1),main=paste("Prop. of pos & neg by",names(dades)[k]))
paleta<-rainbow(length(levels(dades[,k])))
for(c in 1:length(levels(dades[,k]))){lines(colperc[,c],col=paleta[c]) }
legend("topright", levels(dades[,k]), col=paleta, lty=2, cex=0.6)
#conditional on classes
#with legend
plot(marg,type="n",ylim=c(0,1),main=paste("Prop. of pos & neg by",names(dades)[k]))
paleta<-rainbow(length(levels(dades[,k])))
for(c in 1:length(levels(dades[,k]))){lines(rowperc[,c],col=paleta[c]) }
legend("topright", levels(dades[,k]), col=paleta, lty=2, cex=0.6)
#with the variable on the x-axis
marg <-table(dades[,k])/n
print(append("Categories=",levels(dades[,k])))
#with legend
plot(marg,type="l",ylim=c(0,1),main=paste("Prop. of pos & neg by",names(dades)[k]), las=3)
for(c in 1:length(levels(as.factor(P)))){lines(rowperc[c,],col=paleta[c])}
legend("topright", levels(as.factor(P)), col=paleta, lty=2, cex=0.6)
#conditional on columns
#with legend
plot(marg,type="n",ylim=c(0,1),main=paste("Prop. of pos & neg by",names(dades)[k]), las=3)
for(c in 1:length(levels(as.factor(P)))){lines(colperc[c,],col=paleta[c])}
legend("topright", levels(as.factor(P)), col=paleta, lty=2, cex=0.6)
table<-table(dades[,k],P)
print("Cross Table:")
print(table)
print("Distribucions condicionades a columnes:")
print(colperc)
#stacked bar charts
paleta<-rainbow(length(levels(dades[,k])))
barplot(table(dades[,k], as.factor(P)), beside=FALSE,col=paleta )
legend("topright",levels(as.factor(dades[,k])),pch=1,cex=0.5, col=paleta)
#side-by-side bar charts
barplot(table(dades[,k], as.factor(P)), beside=TRUE,col=paleta)
legend("topright",levels(as.factor(dades[,k])),pch=1,cex=0.5, col=paleta)
print("Test Chi quadrat: ")
print(chisq.test(dades[,k], as.factor(P)))
print("valorsTest:")
print( ValorTestXquali(P,dades[,k]))
#compute the p-values of the qualitative variables
}
}
}#endfor
[1] "Analysis by class of the Variable: popularity"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 20.00 31.00 33.29 46.00 93.00
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 17.00 29.00 31.73 46.00 97.00
[1] "p-valueANOVA: 0.0400221586551545"
[1] "p-value Kruskal-Wallis: 0.0120068574469301"
[1] "p-values ValorsTest: "
[1] 0.01756552 0.01756552
[1] "Analysis by class of the Variable: duration_ms"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
28946 164319 213993 220668 265283 594533
Min. 1st Qu. Median Mean 3rd Qu. Max.
30622 171994 221756 231206 277523 579546
[1] "p-valueANOVA: 0.00239712152994276"
[1] "p-value Kruskal-Wallis: 0.00437920349247436"
[1] "p-values ValorsTest: "
[1] 0.0009370723 0.0009370723
[1] "Variable explicit"
[1] "Categories=" "False" "True"
[1] "Categories=" "False" "True"
[1] "Cross Table:"
P
major minor
False 1899 913
True 123 65
[1] "Distribucions condicionades a columnes:"
P False True
major 0.6753201 0.6542553
minor 0.3246799 0.3457447
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 0.26645, df = 1, p-value = 0.6057
[1] "valorsTest:"
$rowpf
Xquali
P False True
major 0.93916914 0.06083086
minor 0.93353783 0.06646217
$vtest
Xquali
P False True
major 0.5965451 -0.5965451
minor -0.5965451 0.5965451
$pval
Xquali
P False True
major 0.2754056 0.2754056
minor 0.2754056 0.2754056
[1] "Analysis by class of the Variable: danceability"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.4290 0.5460 0.5412 0.6637 0.9750
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.4243 0.5570 0.5405 0.6690 0.9530
[1] "p-valueANOVA: 0.919396530566418"
[1] "p-value Kruskal-Wallis: 0.542364091950772"
[1] "p-values ValorsTest: "
[1] 0.4592491 0.4592491
[1] "Analysis by class of the Variable: energy"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000242 0.401250 0.635000 0.608990 0.851000 0.999000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00983 0.48900 0.71100 0.65809 0.87475 0.99900
[1] "p-valueANOVA: 1.42880872470579e-06"
[1] "p-value Kruskal-Wallis: 3.11239888728683e-06"
[1] "p-values ValorsTest: "
[1] 1.116477e-06 1.116477e-06
[1] "Variable key"
[1] "Categories=" "A" "A#/Bb" "B" "C" "C#/Db" "D"
[8] "D#/Eb" "E" "F" "F#/Gb" "G" "G#/Ab"
[1] "Categories=" "A" "A#/Bb" "B" "C" "C#/Db" "D"
[8] "D#/Eb" "E" "F" "F#/Gb" "G" "G#/Ab"
[1] "Cross Table:"
P
major minor
A 202 124
A#/Bb 106 81
B 102 119
C 291 75
C#/Db 170 80
D 266 80
D#/Eb 66 24
E 149 120
F 151 101
F#/Gb 87 73
G 315 68
G#/Ab 117 33
[1] "Distribucions condicionades a columnes:"
P A A#/Bb B C C#/Db D D#/Eb E F F#/Gb
major 0.6196319 0.5668449 0.4615385 0.7950820 0.6800000 0.7687861 0.7333333 0.5539033 0.5992063 0.5437500
minor 0.3803681 0.4331551 0.5384615 0.2049180 0.3200000 0.2312139 0.2666667 0.4460967 0.4007937 0.4562500
P G G#/Ab
major 0.8224543 0.7800000
minor 0.1775457 0.2200000
[1] "Test Chi quadrat: "
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 182.12, df = 11, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P A A#/Bb B C C#/Db D D#/Eb E F
major 0.09990109 0.05242334 0.05044510 0.14391691 0.08407517 0.13155292 0.03264095 0.07368942 0.07467854
minor 0.12678937 0.08282209 0.12167689 0.07668712 0.08179959 0.08179959 0.02453988 0.12269939 0.10327198
Xquali
P F#/Gb G G#/Ab
major 0.04302671 0.15578635 0.05786350
minor 0.07464213 0.06952965 0.03374233
$vtest
Xquali
P A A#/Bb B C C#/Db D D#/Eb E F
major -2.2181664 -3.2282754 -7.0009031 5.2739257 0.2113863 3.9990262 1.2192574 -4.4042106 -2.6465403
minor 2.2181664 3.2282754 7.0009031 -5.2739257 -0.2113863 -3.9990262 -1.2192574 4.4042106 2.6465403
Xquali
P F#/Gb G G#/Ab
major -3.6124383 6.6360891 2.8415215
minor 3.6124383 -6.6360891 -2.8415215
$pval
Xquali
P A A#/Bb B C C#/Db D D#/Eb
major 1.327175e-02 6.226950e-04 1.271538e-12 6.676800e-08 4.162929e-01 3.180182e-05 1.113733e-01
minor 1.327175e-02 6.226950e-04 1.271589e-12 6.676800e-08 4.162929e-01 3.180182e-05 1.113733e-01
Xquali
P E F F#/Gb G G#/Ab
major 5.308488e-06 4.065990e-03 1.516657e-04 1.610576e-11 2.244941e-03
minor 5.308488e-06 4.065990e-03 1.516657e-04 1.610578e-11 2.244941e-03
[1] "Analysis by class of the Variable: loudness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
-37.417 -11.354 -7.519 -8.891 -5.186 0.377
Min. 1st Qu. Median Mean 3rd Qu. Max.
-42.631 -10.707 -7.438 -8.883 -5.177 -1.134
[1] "p-valueANOVA: 0.971118589129941"
[1] "p-value Kruskal-Wallis: 0.592448123464342"
[1] "p-values ValorsTest: "
[1] 0.4853843 0.4853843
[1] "Variable mode"
[1] "Categories=" "major" "minor"
[1] "Categories=" "major" "minor"
[1] "Cross Table:"
P
major minor
major 2022 0
minor 0 978
[1] "Distribucions condicionades a columnes:"
P major minor
major 1 0
minor 0 1
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 2995.5, df = 1, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P major minor
major 1 0
minor 0 1
$vtest
Xquali
P major minor
major 54.77226 -54.77226
minor -54.77226 54.77226
$pval
Xquali
P major minor
major 0 0
minor 0 0
[1] "Analysis by class of the Variable: speechiness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03380 0.04490 0.07438 0.07130 0.96200
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03712 0.05025 0.08738 0.08165 0.95500
[1] "p-valueANOVA: 0.00233148377089234"
[1] "p-value Kruskal-Wallis: 6.65362573251419e-08"
[1] "p-values ValorsTest: "
[1] 0.0006716785 0.0006716785
[1] "Analysis by class of the Variable: acousticness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000001 0.018950 0.270000 0.368455 0.703750 0.996000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000011 0.0055125 0.1380000 0.3071756 0.6167500 0.9960000
[1] "p-valueANOVA: 7.06558400528732e-06"
[1] "p-value Kruskal-Wallis: 1.31911871199142e-06"
[1] "p-values ValorsTest: "
[1] 3.584032e-06 3.584032e-06
[1] "Analysis by class of the Variable: instrumentalness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0000328 0.1591862 0.0473750 1.0000000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0008475 0.2497380 0.6480000 1.0000000
[1] "p-valueANOVA: 9.57965588130597e-11"
[1] "p-value Kruskal-Wallis: 4.72350468857676e-13"
[1] "p-values ValorsTest: "
[1] 3.990808e-12 3.990828e-12
[1] "Analysis by class of the Variable: liveness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0204 0.1020 0.1425 0.2395 0.3068 0.9920
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0112 0.0991 0.1385 0.2346 0.3068 0.9890
[1] "p-valueANOVA: 0.557457582036754"
[1] "p-value Kruskal-Wallis: 0.4249753015773"
[1] "p-values ValorsTest: "
[1] 0.2796026 0.2796026
[1] "Analysis by class of the Variable: valence"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2590 0.4880 0.4860 0.7007 0.9830
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2190 0.4470 0.4588 0.6797 0.9850
[1] "p-valueANOVA: 0.00926221933990095"
[1] "p-value Kruskal-Wallis: 0.0079740097954662"
[1] "p-values ValorsTest: "
[1] 0.004419935 0.004419935
[1] "Analysis by class of the Variable: tempo"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 100.7 122.4 122.9 140.0 209.1
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 98.19 120.12 119.87 137.69 219.97
[1] "p-valueANOVA: 0.00884088440664359"
[1] "p-value Kruskal-Wallis: 0.00992439951138771"
[1] "p-values ValorsTest: "
[1] 0.004591509 0.004591509
[1] "Variable time_signature"
[1] "Categories=" "0" "1" "3" "4" "5"
[1] "Categories=" "0" "1" "3" "4" "5"
[1] "Cross Table:"
P
major minor
0 2 2
1 21 9
3 187 84
4 1772 864
5 40 19
[1] "Distribucions condicionades a columnes:"
P 0 1 3 4 5
major 0.5000000 0.7000000 0.6900369 0.6722307 0.6779661
minor 0.5000000 0.3000000 0.3099631 0.3277693 0.3220339
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 1.0024, df = 4, p-value = 0.9094
[1] "valorsTest:"
$rowpf
Xquali
P 0 1 3 4 5
major 0.0009891197 0.0103857567 0.0924826904 0.8763600396 0.0197823937
minor 0.0020449898 0.0092024540 0.0858895706 0.8834355828 0.0194274029
$vtest
Xquali
P 0 1 3 4 5
major -0.74289976 0.30533573 0.59050719 -0.55636144 0.06563934
minor 0.74289976 -0.30533573 -0.59050719 0.55636144 -0.06563934
$pval
Xquali
P 0 1 3 4 5
major 0.2287712 0.3800552 0.2774253 0.2889819 0.4738325
minor 0.2287712 0.3800552 0.2774253 0.2889819 0.4738325
[1] "Variable track_genre"
[1] "Categories=" "acoustic" "afrobeat" "alt-rock" "alternative"
[6] "ambient" "anime" "black-metal" "bluegrass" "blues"
[11] "brazil" "breakbeat" "british" "cantopop" "chicago-house"
[16] "children" "chill" "classical" "club" "comedy"
[21] "country" "dance" "dancehall" "death-metal" "detroit-techno"
[26] "disco" "disney" "drum-and-bass" "edm" "electro"
[31] "electronic" "emo" "folk" "forro" "garage"
[36] "german" "gospel" "goth" "grindcore" "groove"
[41] "grunge" "guitar" "happy" "hard-rock" "hardcore"
[46] "heavy-metal" "hip-hop" "honky-tonk" "idm" "indian"
[51] "indie" "indie-pop" "industrial" "iranian" "j-dance"
[56] "j-idol" "j-pop" "j-rock" "jazz" "k-pop"
[61] "kids" "latin" "latino" "malay" "mandopop"
[66] "metal" "metalcore" "minimal-techno" "mpb" "new-age"
[71] "opera" "pagode" "party" "piano" "pop"
[76] "pop-film" "power-pop" "progressive-house" "psych-rock" "punk"
[81] "punk-rock" "r-n-b" "rock-n-roll" "rockabilly" "romance"
[86] "salsa" "samba" "sertanejo" "show-tunes" "singer-songwriter"
[91] "ska" "sleep" "spanish" "study" "swedish"
[96] "synth-pop" "tango" "trance" "trip-hop" "turkish"
[101] "world-music"
[1] "Categories=" "acoustic" "afrobeat" "alt-rock" "alternative"
[6] "ambient" "anime" "black-metal" "bluegrass" "blues"
[11] "brazil" "breakbeat" "british" "cantopop" "chicago-house"
[16] "children" "chill" "classical" "club" "comedy"
[21] "country" "dance" "dancehall" "death-metal" "detroit-techno"
[26] "disco" "disney" "drum-and-bass" "edm" "electro"
[31] "electronic" "emo" "folk" "forro" "garage"
[36] "german" "gospel" "goth" "grindcore" "groove"
[41] "grunge" "guitar" "happy" "hard-rock" "hardcore"
[46] "heavy-metal" "hip-hop" "honky-tonk" "idm" "indian"
[51] "indie" "indie-pop" "industrial" "iranian" "j-dance"
[56] "j-idol" "j-pop" "j-rock" "jazz" "k-pop"
[61] "kids" "latin" "latino" "malay" "mandopop"
[66] "metal" "metalcore" "minimal-techno" "mpb" "new-age"
[71] "opera" "pagode" "party" "piano" "pop"
[76] "pop-film" "power-pop" "progressive-house" "psych-rock" "punk"
[81] "punk-rock" "r-n-b" "rock-n-roll" "rockabilly" "romance"
[86] "salsa" "samba" "sertanejo" "show-tunes" "singer-songwriter"
[91] "ska" "sleep" "spanish" "study" "swedish"
[96] "synth-pop" "tango" "trance" "trip-hop" "turkish"
[101] "world-music"
[1] "Cross Table:"
P
major minor
acoustic 19 4
afrobeat 25 16
alt-rock 27 19
alternative 4 4
ambient 35 6
anime 12 20
black-metal 8 9
bluegrass 33 8
blues 2 0
brazil 15 4
breakbeat 22 14
british 51 20
cantopop 41 10
chicago-house 32 35
children 84 14
chill 6 0
classical 18 4
club 21 3
comedy 21 11
country 5 0
dance 2 0
dancehall 5 9
death-metal 12 9
detroit-techno 44 35
disco 3 1
disney 33 10
drum-and-bass 2 2
edm 2 1
electro 8 3
electronic 9 10
emo 20 10
folk 11 7
forro 37 13
garage 25 8
german 14 10
gospel 12 2
goth 32 15
grindcore 24 25
groove 26 8
grunge 28 16
guitar 21 10
happy 19 11
hard-rock 33 12
hardcore 6 6
heavy-metal 63 40
hip-hop 3 5
honky-tonk 106 3
idm 30 31
indian 8 1
indie 1 0
indie-pop 2 0
industrial 30 17
iranian 24 20
j-dance 14 12
j-idol 79 31
j-pop 10 2
j-rock 11 5
jazz 1 1
k-pop 19 18
kids 64 20
latin 0 1
latino 1 2
malay 22 14
mandopop 21 2
metal 10 4
metalcore 13 17
minimal-techno 5 1
mpb 10 6
new-age 47 16
opera 14 6
pagode 39 20
party 15 5
piano 22 6
pop 1 0
pop-film 2 1
power-pop 24 4
progressive-house 2 2
psych-rock 26 11
punk 4 1
punk-rock 13 4
r-n-b 11 0
rock-n-roll 28 7
rockabilly 22 7
romance 19 34
salsa 13 8
samba 20 4
sertanejo 36 8
show-tunes 16 2
singer-songwriter 8 0
ska 27 13
sleep 16 18
spanish 4 7
study 35 39
swedish 5 2
synth-pop 5 13
tango 25 20
trance 3 7
trip-hop 11 16
turkish 2 5
world-music 51 5
[1] "Distribucions condicionades a columnes:"
P acoustic afrobeat alt-rock alternative ambient anime black-metal bluegrass blues
major 0.82608696 0.60975610 0.58695652 0.50000000 0.85365854 0.37500000 0.47058824 0.80487805 1.00000000
minor 0.17391304 0.39024390 0.41304348 0.50000000 0.14634146 0.62500000 0.52941176 0.19512195 0.00000000
P brazil breakbeat british cantopop chicago-house children chill classical club
major 0.78947368 0.61111111 0.71830986 0.80392157 0.47761194 0.85714286 1.00000000 0.81818182 0.87500000
minor 0.21052632 0.38888889 0.28169014 0.19607843 0.52238806 0.14285714 0.00000000 0.18181818 0.12500000
P comedy country dance dancehall death-metal detroit-techno disco disney
major 0.65625000 1.00000000 1.00000000 0.35714286 0.57142857 0.55696203 0.75000000 0.76744186
minor 0.34375000 0.00000000 0.00000000 0.64285714 0.42857143 0.44303797 0.25000000 0.23255814
P drum-and-bass edm electro electronic emo folk forro garage german
major 0.50000000 0.66666667 0.72727273 0.47368421 0.66666667 0.61111111 0.74000000 0.75757576 0.58333333
minor 0.50000000 0.33333333 0.27272727 0.52631579 0.33333333 0.38888889 0.26000000 0.24242424 0.41666667
P gospel goth grindcore groove grunge guitar happy hard-rock hardcore
major 0.85714286 0.68085106 0.48979592 0.76470588 0.63636364 0.67741935 0.63333333 0.73333333 0.50000000
minor 0.14285714 0.31914894 0.51020408 0.23529412 0.36363636 0.32258065 0.36666667 0.26666667 0.50000000
P heavy-metal hip-hop honky-tonk idm indian indie indie-pop industrial iranian
major 0.61165049 0.37500000 0.97247706 0.49180328 0.88888889 1.00000000 1.00000000 0.63829787 0.54545455
minor 0.38834951 0.62500000 0.02752294 0.50819672 0.11111111 0.00000000 0.00000000 0.36170213 0.45454545
P j-dance j-idol j-pop j-rock jazz k-pop kids latin latino
major 0.53846154 0.71818182 0.83333333 0.68750000 0.50000000 0.51351351 0.76190476 0.00000000 0.33333333
minor 0.46153846 0.28181818 0.16666667 0.31250000 0.50000000 0.48648649 0.23809524 1.00000000 0.66666667
P malay mandopop metal metalcore minimal-techno mpb new-age opera pagode
major 0.61111111 0.91304348 0.71428571 0.43333333 0.83333333 0.62500000 0.74603175 0.70000000 0.66101695
minor 0.38888889 0.08695652 0.28571429 0.56666667 0.16666667 0.37500000 0.25396825 0.30000000 0.33898305
P party piano pop pop-film power-pop progressive-house psych-rock punk
major 0.75000000 0.78571429 1.00000000 0.66666667 0.85714286 0.50000000 0.70270270 0.80000000
minor 0.25000000 0.21428571 0.00000000 0.33333333 0.14285714 0.50000000 0.29729730 0.20000000
P punk-rock r-n-b rock-n-roll rockabilly romance salsa samba sertanejo show-tunes
major 0.76470588 1.00000000 0.80000000 0.75862069 0.35849057 0.61904762 0.83333333 0.81818182 0.88888889
minor 0.23529412 0.00000000 0.20000000 0.24137931 0.64150943 0.38095238 0.16666667 0.18181818 0.11111111
P singer-songwriter ska sleep spanish study swedish synth-pop tango
major 1.00000000 0.67500000 0.47058824 0.36363636 0.47297297 0.71428571 0.27777778 0.55555556
minor 0.00000000 0.32500000 0.52941176 0.63636364 0.52702703 0.28571429 0.72222222 0.44444444
P trance trip-hop turkish world-music
major 0.30000000 0.40740741 0.28571429 0.91071429
minor 0.70000000 0.59259259 0.71428571 0.08928571
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 342.97, df = 99, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal
major 0.0093966370 0.0123639960 0.0133531157 0.0019782394 0.0173095945 0.0059347181 0.0039564787
minor 0.0040899796 0.0163599182 0.0194274029 0.0040899796 0.0061349693 0.0204498978 0.0092024540
Xquali
P bluegrass blues brazil breakbeat british cantopop chicago-house
major 0.0163204748 0.0009891197 0.0074183976 0.0108803165 0.0252225519 0.0202769535 0.0158259149
minor 0.0081799591 0.0000000000 0.0040899796 0.0143149284 0.0204498978 0.0102249489 0.0357873211
Xquali
P children chill classical club comedy country dance
major 0.0415430267 0.0029673591 0.0089020772 0.0103857567 0.0103857567 0.0024727992 0.0009891197
minor 0.0143149284 0.0000000000 0.0040899796 0.0030674847 0.0112474438 0.0000000000 0.0000000000
Xquali
P dancehall death-metal detroit-techno disco disney drum-and-bass edm
major 0.0024727992 0.0059347181 0.0217606330 0.0014836795 0.0163204748 0.0009891197 0.0009891197
minor 0.0092024540 0.0092024540 0.0357873211 0.0010224949 0.0102249489 0.0020449898 0.0010224949
Xquali
P electro electronic emo folk forro garage german
major 0.0039564787 0.0044510386 0.0098911968 0.0054401583 0.0182987141 0.0123639960 0.0069238378
minor 0.0030674847 0.0102249489 0.0102249489 0.0071574642 0.0132924335 0.0081799591 0.0102249489
Xquali
P gospel goth grindcore groove grunge guitar happy
major 0.0059347181 0.0158259149 0.0118694362 0.0128585559 0.0138476756 0.0103857567 0.0093966370
minor 0.0020449898 0.0153374233 0.0255623722 0.0081799591 0.0163599182 0.0102249489 0.0112474438
Xquali
P hard-rock hardcore heavy-metal hip-hop honky-tonk idm indian
major 0.0163204748 0.0029673591 0.0311572700 0.0014836795 0.0524233432 0.0148367953 0.0039564787
minor 0.0122699387 0.0061349693 0.0408997955 0.0051124744 0.0030674847 0.0316973415 0.0010224949
Xquali
P indie indie-pop industrial iranian j-dance j-idol j-pop
major 0.0004945598 0.0009891197 0.0148367953 0.0118694362 0.0069238378 0.0390702275 0.0049455984
minor 0.0000000000 0.0000000000 0.0173824131 0.0204498978 0.0122699387 0.0316973415 0.0020449898
Xquali
P j-rock jazz k-pop kids latin latino malay
major 0.0054401583 0.0004945598 0.0093966370 0.0316518299 0.0000000000 0.0004945598 0.0108803165
minor 0.0051124744 0.0010224949 0.0184049080 0.0204498978 0.0010224949 0.0020449898 0.0143149284
Xquali
P mandopop metal metalcore minimal-techno mpb new-age opera
major 0.0103857567 0.0049455984 0.0064292779 0.0024727992 0.0049455984 0.0232443126 0.0069238378
minor 0.0020449898 0.0040899796 0.0173824131 0.0010224949 0.0061349693 0.0163599182 0.0061349693
Xquali
P pagode party piano pop pop-film power-pop progressive-house
major 0.0192878338 0.0074183976 0.0108803165 0.0004945598 0.0009891197 0.0118694362 0.0009891197
minor 0.0204498978 0.0051124744 0.0061349693 0.0000000000 0.0010224949 0.0040899796 0.0020449898
Xquali
P psych-rock punk punk-rock r-n-b rock-n-roll rockabilly romance
major 0.0128585559 0.0019782394 0.0064292779 0.0054401583 0.0138476756 0.0108803165 0.0093966370
minor 0.0112474438 0.0010224949 0.0040899796 0.0000000000 0.0071574642 0.0071574642 0.0347648262
Xquali
P salsa samba sertanejo show-tunes singer-songwriter ska sleep
major 0.0064292779 0.0098911968 0.0178041543 0.0079129575 0.0039564787 0.0133531157 0.0079129575
minor 0.0081799591 0.0040899796 0.0081799591 0.0020449898 0.0000000000 0.0132924335 0.0184049080
Xquali
P spanish study swedish synth-pop tango trance trip-hop
major 0.0019782394 0.0173095945 0.0024727992 0.0024727992 0.0123639960 0.0014836795 0.0054401583
minor 0.0071574642 0.0398773006 0.0020449898 0.0132924335 0.0204498978 0.0071574642 0.0163599182
Xquali
P turkish world-music
major 0.0009891197 0.0252225519
minor 0.0051124744 0.0051124744
$vtest
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal bluegrass
major 1.56202633 -0.88363567 -1.26920504 -1.05132097 2.47109352 -3.62773962 -1.79430299 1.80014769
minor -1.56202633 0.88363567 1.26920504 1.05132097 -2.47109352 3.62773962 1.79430299 -1.80014769
Xquali
P blues brazil breakbeat british cantopop chicago-house children chill
major 0.98387214 1.07721084 -0.80985627 0.80610523 1.99641506 -3.46831336 3.93256761 1.70525451
minor -0.98387214 -1.07721084 0.80985627 -0.80610523 -1.99641506 3.46831336 -3.93256761 -1.70525451
Xquali
P classical club comedy country dance dancehall death-metal detroit-techno
major 1.44804271 2.10914819 -0.21535912 1.55641737 0.98387214 -2.53515488 -1.00628889 -2.24903619
minor -1.44804271 -2.10914819 0.21535912 -1.55641737 -0.98387214 2.53515488 1.00628889 2.24903619
Xquali
P disco disney drum-and-bass edm electro electronic emo folk
major 0.32448495 1.31665478 -0.74289976 -0.02711069 0.37762453 -1.86867112 -0.08612033 -0.57092391
minor -0.32448495 -1.31665478 0.74289976 0.02711069 -0.37762453 1.86867112 0.08612033 0.57092391
Xquali
P forro garage german gospel goth grindcore groove grunge
major 1.00401409 1.02991266 -0.95139023 1.46531495 0.10099435 -2.77354083 1.13477881 -0.53654192
minor -1.00401409 -1.02991266 0.95139023 -1.46531495 -0.10099435 2.77354083 -1.13477881 0.53654192
Xquali
P guitar happy hard-rock hardcore heavy-metal hip-hop honky-tonk idm
major 0.04082647 -0.47757640 0.85555541 -1.28846152 -1.37372348 -1.80658028 6.77207948 -3.06709735
minor -0.04082647 0.47757640 -0.85555541 1.28846152 1.37372348 1.80658028 -6.77207948 3.06709735
Xquali
P indian indie indie-pop industrial iranian j-dance j-idol j-pop
major 1.37736451 0.69558666 0.98387214 -0.52629977 -1.83253689 -1.48081438 1.00719343 1.17985557
minor -1.37736451 -0.69558666 -0.98387214 0.52629977 1.83253689 1.48081438 -1.00719343 -1.17985557
Xquali
P j-rock jazz k-pop kids latin latino malay mandopop
major 0.11550911 -0.52513421 -2.09553725 1.74333225 -1.43811476 -1.25941475 -0.80985627 2.45512314
minor -0.11550911 0.52513421 2.09553725 -1.74333225 1.43811476 1.25941475 0.80985627 -2.45512314
Xquali
P metal metalcore minimal-techno mpb new-age opera pagode party
major 0.32232357 -2.82631279 0.83344750 -0.41925528 1.23271909 0.24888693 -0.21487067 0.72751565
minor -0.32232357 2.82631279 -0.83344750 0.41925528 -1.23271909 -0.24888693 0.21487067 -0.72751565
Xquali
P piano pop pop-film power-pop progressive-house psych-rock punk punk-rock
major 1.26702506 0.69558666 -0.02711069 2.07714339 -0.74289976 0.37478285 0.60156009 0.80012007
minor -1.26702506 -0.69558666 0.02711069 -2.07714339 0.74289976 -0.37478285 -0.60156009 -0.80012007
Xquali
P r-n-b rock-n-roll rockabilly romance salsa samba sertanejo show-tunes
major 2.31085590 1.59960995 0.97689101 -4.94404136 -0.53911670 1.67192841 2.05544803 1.95082482
minor -2.31085590 -1.59960995 -0.97689101 4.94404136 0.53911670 -1.67192841 -2.05544803 -1.95082482
Xquali
P singer-songwriter ska sleep spanish study swedish synth-pop tango
major 1.96971630 0.01358332 -2.54478931 -2.20001730 -3.73555408 0.22765050 -3.59702239 -1.70790649
minor -1.96971630 -0.01358332 2.54478931 2.20001730 3.73555408 -0.22765050 3.59702239 1.70790649
Xquali
P trance trip-hop turkish world-music
major -2.52730634 -2.96861847 -2.19416333 3.81479723
minor 2.52730634 2.96861847 2.19416333 -3.81479723
$pval
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal
major 5.914089e-02 1.884465e-01 1.021840e-01 1.465556e-01 6.735029e-03 1.429567e-04 3.638241e-02
minor 5.914089e-02 1.884465e-01 1.021840e-01 1.465556e-01 6.735029e-03 1.429567e-04 3.638241e-02
Xquali
P bluegrass blues brazil breakbeat british cantopop chicago-house
major 3.591866e-02 1.625892e-01 1.406930e-01 2.090114e-01 2.100911e-01 2.294438e-02 2.618681e-04
minor 3.591866e-02 1.625892e-01 1.406930e-01 2.090114e-01 2.100911e-01 2.294438e-02 2.618681e-04
Xquali
P children chill classical club comedy country dance
major 4.202167e-05 4.407348e-02 7.380255e-02 1.746590e-02 4.147437e-01 5.980444e-02 1.625892e-01
minor 4.202167e-05 4.407348e-02 7.380255e-02 1.746590e-02 4.147437e-01 5.980444e-02 1.625892e-01
Xquali
P dancehall death-metal detroit-techno disco disney drum-and-bass edm
major 5.619881e-03 1.571383e-01 1.225510e-02 3.727855e-01 9.397718e-02 2.287712e-01 4.891857e-01
minor 5.619881e-03 1.571383e-01 1.225510e-02 3.727855e-01 9.397718e-02 2.287712e-01 4.891857e-01
Xquali
P electro electronic emo folk forro garage german
major 3.528548e-01 3.083429e-02 4.656854e-01 2.840256e-01 1.576859e-01 1.515255e-01 1.707032e-01
minor 3.528548e-01 3.083429e-02 4.656854e-01 2.840256e-01 1.576859e-01 1.515255e-01 1.707032e-01
Xquali
P gospel goth grindcore groove grunge guitar happy
major 7.141750e-02 4.597775e-01 2.772494e-03 1.282340e-01 2.957920e-01 4.837171e-01 3.164759e-01
minor 7.141750e-02 4.597775e-01 2.772494e-03 1.282340e-01 2.957920e-01 4.837171e-01 3.164759e-01
Xquali
P hard-rock hardcore heavy-metal hip-hop honky-tonk idm indian
major 1.961219e-01 9.879268e-02 8.476377e-02 3.541387e-02 6.347218e-12 1.080742e-03 8.419979e-02
minor 1.961219e-01 9.879268e-02 8.476377e-02 3.541387e-02 6.347256e-12 1.080742e-03 8.419979e-02
Xquali
P indie indie-pop industrial iranian j-dance j-idol j-pop
major 2.433439e-01 1.625892e-01 2.993400e-01 3.343574e-02 6.932802e-02 1.569209e-01 1.190288e-01
minor 2.433439e-01 1.625892e-01 2.993400e-01 3.343574e-02 6.932802e-02 1.569209e-01 1.190288e-01
Xquali
P j-rock jazz k-pop kids latin latino malay
major 4.540208e-01 2.997449e-01 1.806163e-02 4.063780e-02 7.520075e-02 1.039403e-01 2.090114e-01
minor 4.540208e-01 2.997449e-01 1.806163e-02 4.063780e-02 7.520075e-02 1.039403e-01 2.090114e-01
Xquali
P mandopop metal metalcore minimal-techno mpb new-age opera
major 7.041817e-03 3.736038e-01 2.354363e-03 2.022962e-01 3.375148e-01 1.088403e-01 4.017241e-01
minor 7.041817e-03 3.736038e-01 2.354363e-03 2.022962e-01 3.375148e-01 1.088403e-01 4.017241e-01
Xquali
P pagode party piano pop pop-film power-pop progressive-house
major 4.149341e-01 2.334551e-01 1.025732e-01 2.433439e-01 4.891857e-01 1.889416e-02 2.287712e-01
minor 4.149341e-01 2.334551e-01 1.025732e-01 2.433439e-01 4.891857e-01 1.889416e-02 2.287712e-01
Xquali
P psych-rock punk punk-rock r-n-b rock-n-roll rockabilly romance
major 3.539110e-01 2.737335e-01 2.118206e-01 1.042041e-02 5.484257e-02 1.643116e-01 3.825973e-07
minor 3.539110e-01 2.737335e-01 2.118206e-01 1.042041e-02 5.484257e-02 1.643116e-01 3.825973e-07
Xquali
P salsa samba sertanejo show-tunes singer-songwriter ska sleep
major 2.949032e-01 4.726922e-02 1.991788e-02 2.553894e-02 2.443545e-02 4.945812e-01 5.467185e-03
minor 2.949032e-01 4.726922e-02 1.991788e-02 2.553894e-02 2.443545e-02 4.945812e-01 5.467185e-03
Xquali
P spanish study swedish synth-pop tango trance trip-hop
major 1.390283e-02 9.365116e-05 4.099590e-01 1.609404e-04 4.382685e-02 5.747060e-03 1.495709e-03
minor 1.390283e-02 9.365116e-05 4.099590e-01 1.609404e-04 4.382685e-02 5.747060e-03 1.495709e-03
Xquali
P turkish world-music
major 1.411183e-02 6.814740e-05
minor 1.411183e-02 6.814740e-05
[1] "Variable multiple_artists"
[1] "Categories=" "FALSE" "TRUE"
[1] "Categories=" "FALSE" "TRUE"
[1] "Cross Table:"
P
major minor
FALSE 1965 962
TRUE 57 16
[1] "Distribucions condicionades a columnes:"
P FALSE TRUE
major 0.6713358 0.7808219
minor 0.3286642 0.2191781
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 3.4033, df = 1, p-value = 0.06506
[1] "valorsTest:"
$rowpf
Xquali
P FALSE TRUE
major 0.97181009 0.02818991
minor 0.98364008 0.01635992
$vtest
Xquali
P FALSE TRUE
major -1.971207 1.971207
minor 1.971207 -1.971207
$pval
Xquali
P FALSE TRUE
major 0.02435008 0.02435008
minor 0.02435008 0.02435008
[1] "Variable tempo_cat"
[1] "Categories=" "Larghissimo" "Grave" "Lento/Largo" "Larghetto" "Adagio" "Andante"
[8] "Moderato" "Allegro" "Vivace" "Presto" "Prestissimo"
[1] "Categories=" "Larghissimo" "Grave" "Lento/Largo" "Larghetto" "Adagio" "Andante"
[8] "Moderato" "Allegro" "Vivace" "Presto" "Prestissimo"
[1] "Cross Table:"
P
major minor
Larghissimo 0 0
Grave 0 0
Lento/Largo 9 1
Larghetto 13 10
Adagio 63 33
Andante 551 307
Moderato 301 121
Allegro 908 427
Vivace 75 44
Presto 86 30
Prestissimo 14 3
[1] "Distribucions condicionades a columnes:"
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro Vivace Presto
major 0.9000000 0.5652174 0.6562500 0.6421911 0.7132701 0.6801498 0.6302521 0.7413793
minor 0.1000000 0.4347826 0.3437500 0.3578089 0.2867299 0.3198502 0.3697479 0.2586207
P Prestissimo
major 0.8235294
minor 0.1764706
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 16.012, df = 8, p-value = 0.04221
[1] "valorsTest:"
$rowpf
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro
major 0.000000000 0.000000000 0.004455446 0.006435644 0.031188119 0.272772277 0.149009901 0.449504950
minor 0.000000000 0.000000000 0.001024590 0.010245902 0.033811475 0.314549180 0.123975410 0.437500000
Xquali
P Vivace Presto Prestissimo
major 0.037128713 0.042574257 0.006930693
minor 0.045081967 0.030737705 0.003073770
$vtest
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro Vivace
major 0.0000000 0.0000000 1.5259103 -1.1198620 -0.3821151 -2.3706107 1.8460768 0.6195929 -1.0446552
minor 0.0000000 0.0000000 -1.5259103 1.1198620 0.3821151 2.3706107 -1.8460768 -0.6195929 1.0446552
Xquali
P Presto Prestissimo
major 1.5738795 1.3172030
minor -1.5738795 -1.3172030
$pval
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro
major 0.500000000 0.500000000 0.063516101 0.131386282 0.351187992 0.008879362 0.032440526 0.267762926
minor 0.500000000 0.500000000 0.063516101 0.131386282 0.351187992 0.008879362 0.032440526 0.267762926
Xquali
P Vivace Presto Prestissimo
major 0.148091186 0.057757651 0.093885296
minor 0.148091186 0.057757651 0.093885296
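The tempo_cat test above warns that the chi-squared approximation may be incorrect. R emits this warning when some expected cell counts fall below 5. The following is a minimal sketch of one common remedy, a Monte Carlo p-value from `chisq.test(simulate.p.value = TRUE)`. It re-enters the tempo_cat cross table printed above, with the two empty categories dropped (as the reported df = 8 implies). The object names, the seed, and the number of replicates `B` are arbitrary choices made for this illustration.

```r
# Cross table of tempo_cat by mode, copied from the output above.
# Larghissimo and Grave are omitted because they contain no songs.
tempo_mode <- matrix(
  c(  9,   1,
     13,  10,
     63,  33,
    551, 307,
    301, 121,
    908, 427,
     75,  44,
     86,  30,
     14,   3),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    tempo_cat = c("Lento/Largo", "Larghetto", "Adagio", "Andante", "Moderato",
                  "Allegro", "Vivace", "Presto", "Prestissimo"),
    mode = c("major", "minor")
  )
)

# Asymptotic test; the warning is suppressed here because we inspect its cause below.
asymptotic <- suppressWarnings(chisq.test(tempo_mode))

# Cells whose expected count is below 5 are what trigger the warning.
small_cells <- asymptotic$expected < 5
print(round(asymptotic$expected, 2))
print(which(small_cells, arr.ind = TRUE))

# Monte Carlo p-value: does not rely on the large-sample approximation.
set.seed(123)  # arbitrary seed, only for reproducibility
simulated <- chisq.test(tempo_mode, simulate.p.value = TRUE, B = 10000)
print(asymptotic$statistic)
print(simulated$p.value)
```

The test statistic is the same in both calls; only the way the p-value is obtained differs.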
# most significant descriptors of each class. TODO: add info for the qualitative variables
for (c in 1:length(levels(as.factor(P)))) {
if(!is.na(levels(as.factor(P))[c])){
print(paste("P.values per class:",levels(as.factor(P))[c]));
print(sort(pvalk[c,]), digits=3)
}
}
[1] "P.values per class: major"
explicit key mode time_signature track_genre multiple_artists
0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
tempo_cat instrumentalness energy acousticness speechiness duration_ms
0.00e+00 3.99e-12 1.12e-06 3.58e-06 6.72e-04 9.37e-04
valence tempo popularity liveness danceability loudness
4.42e-03 4.59e-03 1.76e-02 2.80e-01 4.59e-01 4.85e-01
[1] "P.values per class: minor"
explicit key mode time_signature track_genre multiple_artists
0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
tempo_cat instrumentalness energy acousticness speechiness duration_ms
0.00e+00 3.99e-12 1.12e-06 3.58e-06 6.72e-04 9.37e-04
valence tempo popularity liveness danceability loudness
4.42e-03 4.59e-03 1.76e-02 2.80e-01 4.59e-01 4.85e-01
# TODO: add the information on the categories of the qualitative variables to the list of p-values and sort globally
#saving the dataframe in an external file
#write.table(dd, file = "credscoClean.csv", sep = ";", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE)
Findings:
From the boxplot of valence vs mode, we can see that minor songs tend to have lower valence (a sadder mood) than major songs. Similarly, minor songs tend to have a lower tempo than major songs, although there are outliers.
Interestingly, many major songs are in the keys of G, C, D and A, whereas the most common keys for minor songs are A, B and E.
For popularity, minor songs span a wider range than major songs.
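The boxplot readings above can be checked numerically with per-group medians, interquartile ranges, and ranges. This is a minimal sketch in base R. The helper `summarise_by_group` is a hypothetical name, and the `toy` data frame is invented purely for illustration; it stands in for `dd` and does not contain Spotify values.

```r
# Hypothetical stand-in for `dd`; the values are invented for illustration only.
toy <- data.frame(
  mode       = factor(c("major", "major", "major", "minor", "minor", "minor")),
  valence    = c(0.80, 0.60, 0.70, 0.30, 0.50, 0.40),
  popularity = c(40, 50, 60, 10, 50, 90)
)

# Median, IQR and range width of one numeric column within each level of `group`.
# Returns a matrix with one row per group.
summarise_by_group <- function(df, value, group) {
  per_group <- split(df[[value]], df[[group]])
  t(sapply(per_group, function(v) {
    c(median = median(v, na.rm = TRUE),
      iqr    = IQR(v, na.rm = TRUE),
      range  = diff(range(v, na.rm = TRUE)))
  }))
}

valence_by_mode    <- summarise_by_group(toy, "valence", "mode")
popularity_by_mode <- summarise_by_group(toy, "popularity", "mode")
print(valence_by_mode)
print(popularity_by_mode)
```

On the real data, the same call would be `summarise_by_group(dd, "valence", "mode")`, assuming `dd` has those column names.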
Genre vs valence.
library(dplyr)
library(forcats)
plotdata <- dd %>%
group_by(track_genre) %>%
summarize(mean_valence = mean(valence))
# plot mean valence by genre
ggplot(plotdata,
aes(x = fct_reorder(track_genre, mean_valence),
y = mean_valence)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90)) +
xlab("Genre") + ylab("Mean Valence")
One genre stands out with the highest mean valence: R&B (r-n-b in the data). The sleep genre has the lowest mean valence.
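With more than a hundred genres on the x-axis, the extremes of the bar chart are easier to read from a sorted table than from the plot. This is a minimal sketch in base R. The helper `rank_groups_by_mean` is a hypothetical name, and the `toy` data frame with its genre labels is invented for illustration only.

```r
# Hypothetical stand-in for `dd`; genres and values are invented for illustration.
toy <- data.frame(
  track_genre = c("genre_a", "genre_a", "genre_b", "genre_b", "genre_c", "genre_c"),
  valence     = c(0.20, 0.30, 0.80, 0.90, 0.50, 0.60)
)

# Mean of `value` within each level of `group`, sorted from lowest to highest.
rank_groups_by_mean <- function(df, value, group) {
  means <- vapply(split(df[[value]], df[[group]]), mean, numeric(1), na.rm = TRUE)
  sort(means)
}

genre_means <- rank_groups_by_mean(toy, "valence", "track_genre")
lowest  <- head(genre_means, 1)  # group with the lowest mean
highest <- tail(genre_means, 1)  # group with the highest mean
print(genre_means)
```

The same helper applies to the energy and danceability comparisons by changing the `value` argument.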
Next, we examine the relationship between genre and energy by calculating the mean energy per genre.
plotdata2 <- dd %>%
group_by(track_genre) %>%
summarize(mean_energy = mean(energy))
# plot mean energy by genre
ggplot(plotdata2,
aes(x = fct_reorder(track_genre, mean_energy),
y = mean_energy)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90))
Classical songs have the least mean energy, and drum-and-bass songs have the highest mean energy.
We can also examine the relationship between genre and danceability.
plotdata3 <- dd %>%
group_by(track_genre) %>%
summarize(mean_danceability = mean(danceability))
# plot mean danceability by genre
ggplot(plotdata3,
aes(x = fct_reorder(track_genre, mean_danceability),
y = mean_danceability)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = mode,
y = valence)) +
geom_boxplot() +
labs(title = "Valence distribution by mode")
Energy vs mode
ggplot(dd,
aes(x = mode,
y = energy)) +
geom_boxplot() +
labs(title = "Energy distribution by mode")
Minor songs actually span a wider range of energy than major songs. This is interesting, because one might expect high-energy songs to be mostly happy songs, which are typically in a major key. One possible explanation is that many of the Latin songs are in a minor key.
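The suggested explanation can be checked by looking at the share of minor songs within each genre. This is a minimal sketch using `table` and `prop.table`. The `toy` data frame and its genre labels are invented for illustration, so the printed shares say nothing about the actual dataset.

```r
# Hypothetical stand-in for `dd`; genres and modes are invented for illustration.
toy <- data.frame(
  track_genre = c("genre_a", "genre_a", "genre_a", "genre_b", "genre_b", "genre_b"),
  mode        = c("minor", "minor", "major", "major", "major", "minor")
)

# Row-wise proportions: within each genre, the share of major and minor songs.
mode_share <- prop.table(table(toy$track_genre, toy$mode), margin = 1)

# Genres ordered by their share of minor songs, highest first.
minor_share <- sort(mode_share[, "minor"], decreasing = TRUE)
print(round(minor_share, 3))
```

On the real data, `table(dd$track_genre, dd$mode)` would replace the toy table, assuming those column names.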
ggplot(dd,
aes(x = mode,
y = acousticness)) +
geom_boxplot() +
labs(title = "Acousticness distribution by mode")
Looking at valence vs explicitness, we see that songs that are explicit tend to have lower valence than songs that are clean.
ggplot(dd,
aes(x = explicit,
y = valence)) +
geom_boxplot() +
labs(title = "Valence distribution by explicit") +
scale_x_discrete(guide = guide_axis(angle = 90))
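A boxplot shows the difference visually; a rank-based test gives a rough check of whether the shift could be due to chance. This is a minimal sketch using the Wilcoxon rank-sum test, which is one reasonable choice here and not necessarily the test used elsewhere in this report. The `toy` data frame is invented for illustration only.

```r
# Hypothetical stand-in for `dd`; the values are invented for illustration only.
toy <- data.frame(
  explicit = rep(c(FALSE, TRUE), each = 5),
  valence  = c(0.90, 0.80, 0.70, 0.60, 0.55,
               0.50, 0.40, 0.30, 0.20, 0.10)
)

# Median valence for clean (FALSE) and explicit (TRUE) songs.
medians <- tapply(toy$valence, toy$explicit, median)
print(medians)

# Two-sided Wilcoxon rank-sum test of valence between the two groups.
valence_test <- wilcox.test(valence ~ explicit, data = toy)
print(valence_test$p.value)
```

With a sample as large as the full dataset, even small differences will tend to be statistically significant, so the size of the median gap matters more than the p-value.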
# keep only the numerical columns (selected by position) and draw a scatterplot matrix
numerical_only <- dd %>% select(1:2, 4:5, 7, 9:14)
pairs(numerical_only)
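# Panel function for pairs(): prints the absolute correlation of each pair,
# with the text size scaled by the strength of the correlation.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  # remember the current plot coordinates and restore them when the panel is done
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}
# smoothed scatterplots below the diagonal, correlations above it
pairs(numerical_only, lower.panel = panel.smooth, upper.panel = panel.cor,
      gap=0, row1attop=FALSE)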
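The `panel.cor` panels show the absolute value of each correlation, so the direction of the relationship is lost. A signed correlation matrix, flattened into a list of pairs sorted by strength, complements the plot. This is a minimal sketch in base R. The `toy_num` data frame is invented for illustration and stands in for `numerical_only`.

```r
# Hypothetical stand-in for `numerical_only`; the values are invented for illustration.
toy_num <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 5, 8, 10, 13),
  z = c(3, 1, 4, 1, 5, 2)
)

# Signed Pearson correlations, ignoring missing values pair by pair.
cor_mat <- cor(toy_num, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and sort by absolute correlation.
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
cor_pairs <- cor_pairs[order(-abs(cor_pairs$r)), ]
print(cor_pairs, row.names = FALSE)
```

On the real data, `cor(numerical_only, use = "pairwise.complete.obs")` would replace the toy matrix.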